    Effective early termination techniques for text similarity join operator

    This study was presented as a paper at the 20th International Symposium on Computer and Information Sciences, held in Istanbul, Turkey, on 26-28 October 2005. The text similarity join operator joins two relations if their join attributes are textually similar to each other. It has a variety of application domains, including integration and querying of data from heterogeneous resources, data cleansing, and data mining. Although the text similarity join operator is widely used, its processing is expensive because of the huge number of similarity computations it performs. In this paper, we incorporate several shortcut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics. (Inst Elec & Elect Engineers, Turkey Sect; Boğaziçi Üniversitesi)
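
    The quit and continue heuristics come from accumulator limiting in IR query evaluation: "quit" stops reading query terms once the number of accumulators reaches a limit, while "continue" keeps reading terms but only updates accumulators that already exist. A minimal Python sketch of both inside an inverted-index similarity join, assuming whitespace tokenization, L2-normalized raw term-frequency vectors, and cosine similarity. The function names and the `max_accumulators` parameter are hypothetical; this is not the paper's join algorithm, and the Harman and maximal similarity filter heuristics are not shown.

```python
import math
from collections import defaultdict


def tf_vector(text):
    """L2-normalised raw term-frequency vector of a whitespace-tokenised string."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def build_index(vectors):
    """Inverted index: term -> list of (record id, term weight)."""
    index = defaultdict(list)
    for rec_id, vec in enumerate(vectors):
        for term, weight in vec.items():
            index[term].append((rec_id, weight))
    return index


def similarity_join(left, right, threshold, max_accumulators=None, heuristic="continue"):
    """Return (i, j, score) for pairs left[i], right[j] whose score >= threshold.

    Without a limit the score is the exact cosine; with max_accumulators set,
    'quit' stops reading query terms once the limit is reached, and 'continue'
    keeps reading them but only updates accumulators that already exist.
    """
    index = build_index([tf_vector(t) for t in right])

    def limit_reached(acc):
        return max_accumulators is not None and len(acc) >= max_accumulators

    results = []
    for i, text in enumerate(left):
        query = tf_vector(text)
        acc = {}
        # Heavier query terms first, so they are the ones that create accumulators.
        for term, q_weight in sorted(query.items(), key=lambda kv: -kv[1]):
            if heuristic == "quit" and limit_reached(acc):
                break
            for rec_id, r_weight in index.get(term, ()):
                if rec_id in acc:
                    acc[rec_id] += q_weight * r_weight
                elif not limit_reached(acc):
                    acc[rec_id] = q_weight * r_weight
        for rec_id, score in acc.items():
            if score >= threshold:
                results.append((i, rec_id, round(score, 3)))
    return results
```

    For example, joining `["text similarity join"]` with `["similarity join operator", "particle filtering", "text similarity join"]` at threshold 0.5 finds both similar records; with `max_accumulators=1`, "continue" still reports the exact duplicate, whereas "quit" stops before its partial score passes the threshold.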

    Using covariates for improving the minimum redundancy maximum relevance feature selection method

    Maximizing the joint dependency with the target variable using as few variables as possible is generally the main task of feature selection. To obtain a minimal subset while maximizing this joint dependency, the redundancy among the selected variables must be kept to a minimum. In this paper, we propose a method based on the recently popular minimum Redundancy-Maximum Relevance (mRMR) criterion. The experimental results show that feeding the covariates into mRMR, instead of the features themselves, improves feature selection capability and yields more expressive variable subsets.
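
    A minimal Python sketch of the baseline greedy mRMR criterion in its difference form (mutual information with the target minus the mean mutual information with already selected variables), assuming discrete-valued columns. The function names are hypothetical, and how the paper constructs its covariates is not described here, so the sketch simply scores whatever columns it is given.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, target, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    features: dict mapping a column name to a list of discrete values.
    """
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected = []

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while len(selected) < min(k, len(features)):
        candidates = sorted(f for f in features if f not in selected)
        selected.append(max(candidates, key=score))
    return selected
```

    With two independent columns that jointly determine the target plus an exact copy of one of them, the copy is passed over in favour of the independent column, because its redundancy cancels its relevance.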

    Biomedical image time series registration with particle filtering

    We propose a family of methods for biomedical image time series registration based on particle filtering. The first method uses an intensity-based, information-theoretic approach to calculate the importance weights. A second, effective group of methods uses landmark-based approaches for the same purpose, automatically detecting intensity maxima or SIFT interest points in the image time series. A brute-force search for the best alignment usually produces good results with proper cost functions, but becomes computationally expensive if the whole search space is explored. Hill-climbing optimizations, in contrast, can only find local optima. Particle filtering avoids such local solutions by introducing randomness and sequentially updating the posterior distribution over probable solutions, which can make it more robust for registering image time series. We show promising preliminary results on dendrite image time series.
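
    A minimal sketch of sequential importance resampling for registration, under heavy simplifying assumptions: each "image" is a 1-D signal, the transform is an integer circular shift relative to the first frame, the motion model is a Gaussian random walk, and the importance weights come from a mean squared intensity difference. That cost is a swapped-in stand-in for the paper's information-theoretic measure, and all names and parameters are hypothetical.

```python
import math
import random


def shifted(signal, s):
    """Circularly shift a 1-D signal right by integer s (stand-in for image warping)."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


def cost(reference, frame, s):
    """Mean squared difference after undoing a candidate shift s."""
    back = shifted(frame, -s)
    return sum((a - b) ** 2 for a, b in zip(reference, back)) / len(reference)


def particle_filter_register(frames, n_particles=200, step_sd=2.0,
                             temperature=0.001, seed=0):
    """Estimate each frame's shift relative to frames[0]."""
    rng = random.Random(seed)
    reference = frames[0]
    particles = [0.0] * n_particles
    estimates = [0]
    for frame in frames[1:]:
        # Predict: propagate particles with a random-walk motion model.
        particles = [p + rng.gauss(0.0, step_sd) for p in particles]
        # Update: importance weights from the intensity cost (best particle gets weight 1).
        costs = [cost(reference, frame, round(p)) for p in particles]
        best = min(costs)
        weights = [math.exp(-(c - best) / temperature) for c in costs]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Point estimate: posterior mean of the shift.
        estimates.append(round(sum(w * p for w, p in zip(weights, particles))))
        # Resample so that later frames start from probable shifts.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

    Because the particles are spread by random noise rather than moved uphill, the filter keeps several hypotheses alive at once, which is the property the abstract contrasts with hill climbing.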

    Intelligent data analysis to interpret major risk factors for diabetic patients with and without ischemic stroke in a small population

    This study proposes an intelligent data analysis approach to investigate and interpret the distinctive factors of diabetes mellitus patients with and without ischemic (non-embolic) stroke in a small population. The database consists of 16 features collected from 44 diabetic patients, including age, gender, duration of diabetes, cholesterol, high-density lipoprotein and triglyceride levels, neuropathy, nephropathy, retinopathy, peripheral vascular disease, myocardial infarction rate, glucose level, medication, and blood pressure. Metric and non-metric features are distinguished. First, the mean and covariance of the data are estimated and the correlated components are identified. Second, the major components are extracted by principal component analysis. Finally, a k-nearest-neighbor classifier and a multilayer perceptron (a high-degree polynomial classifier), as common examples of local and global classification approaches respectively, are used for classification with both all components and only the major components. Macrovascular changes emerged as the principal distinctive factors of ischemic stroke in diabetes mellitus, whereas microvascular changes were generally ineffective discriminators. Recommendations were made according to the rules of evidence-based medicine. In brief, this case study, based on a small population, supports theories of stroke in diabetes mellitus patients and concludes that intelligent data analysis improves personalized preventive intervention.
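
    The mean/covariance, principal component, and k-nearest-neighbor steps can be sketched in plain Python as below. This is a minimal illustration assuming purely metric features, eigenvectors found by power iteration with deflation, and Euclidean distance with majority voting; the function names and the toy rows in the check are hypothetical and are not the study's patient data. The multilayer perceptron branch is not shown.

```python
import math
from collections import Counter


def mean_and_cov(rows):
    """Sample mean vector and covariance matrix of a list of feature rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov


def principal_components(cov, k, iters=500):
    """Top-k eigenvectors of a symmetric matrix via power iteration with deflation."""
    d = len(cov)
    c = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # arbitrary non-uniform start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: remove the found component before searching for the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(row, mu, comps):
    """Coordinates of a centred row on the given principal components."""
    return [sum((row[j] - mu[j]) * comp[j] for j in range(len(row))) for comp in comps]


def knn_predict(train, labels, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]
```

    Calling `principal_components(cov, k)` with `k` smaller than the number of features corresponds to the major-components case; `k` equal to the number of features keeps all components.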

    The Decision of Intrauterine Growth Retardation from Ultrasonographic Examinations with Neural Networks

    Our purpose is to decide whether intrauterine growth retardation (IUGR) is present from single and multiple ultrasonographic fetal growth assessments using a neural network (NN). This study was undertaken to show whether a feedforward NN can learn the nominal growth curves of head circumference (HC), abdominal circumference (AC), and the HC/AC ratio versus gestational age, and whether it can help doctors diagnose IUGR. Weekly ultrasonographic examinations (from 1 to 4 weeks) are taken as input to the NN. A feedforward NN is used as a function approximator, and the backpropagation (BP) algorithm is used to optimize its connection weights using samples from the nominal curves. It was observed that an NN can improve the accuracy of the IUGR decision by using multiple weekly examinations, which amounts to monitoring the dynamic process of change in size over time. It was concluded that applying NNs to the determination of IUGR is feasible and is a fruitful line of inquiry for further work.
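
    To illustrate a feedforward NN used as a function approximator trained by backpropagation, here is a minimal Python sketch: one input (a normalized gestational age), one tanh hidden layer, a linear output, and online gradient descent on squared error. The architecture, learning rate, and the saturating curve in the check are hypothetical choices made for the example; the curve is made up and is not a clinical HC or AC growth curve.

```python
import math
import random


def train_mlp(xs, ys, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Fit y = f(x) with a 1-hidden-layer tanh network trained by online backprop.

    Returns (predict, mse_before, mse_after).
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    before = mse()
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h, out = forward(x)
            err = out - y  # derivative of 0.5 * err**2 with respect to the output
            for j in range(hidden):
                # Gradient at the hidden pre-activation, computed before w2[j] changes.
                grad_pre = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_pre * x
                b1[j] -= lr * grad_pre
            b2 -= lr * err

    def predict(x):
        return forward(x)[1]

    return predict, before, mse()
```

    In this setup the trained `predict` plays the role of a learned nominal curve; comparing a measured value against `predict(age)` over several weekly examinations is one way such a model could support a decision, though the paper's exact decision rule is not given here.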

    Discrimination Ability of Time-Domain Features and Rules for Arrhythmia Classification

    This study investigates relevant diagnostic information for arrhythmia classification in previously collected cardiac data. The discrimination ability of various time-domain attributes and rules is discussed for the automatic diagnosis of arrhythmia from electrocardiogram (ECG) signals. Naive Bayes, C4.5, multilayer perceptron (MLP), and support vector machine (SVM) algorithms were tested on subsets of the input features selected by the correlation-based feature selection (CFS) method. The Hot Spot algorithm was employed to extract rules useful for diagnosing cardiac problems from the ECG signal. In total, 257 time-domain features of 452 cases from a cardiac arrhythmia database [1] were used. Various testing configurations and performance measures were considered, including accuracy, true positive (TP) and false positive (FP) rates, precision, recall, and AUC. The discrimination ability of the selected features and the extracted rules was demonstrated.
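
    CFS scores a feature subset by the merit heuristic k·r̄cf / sqrt(k + k(k−1)·r̄ff), where k is the subset size, r̄cf the mean feature-class correlation, and r̄ff the mean feature-feature correlation. A minimal Python sketch, assuming absolute Pearson correlation as the correlation measure and a simple greedy forward search (other CFS implementations use different correlation measures and search strategies); the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, target):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_cfs(features, target):
    """Greedily add the feature that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in sorted(features) if f not in selected]
        if not candidates:
            break
        merit, f = max((cfs_merit(selected + [f], features, target), f) for f in candidates)
        if merit <= best:
            break
        selected.append(f)
        best = merit
    return selected, best
```

    The denominator is what penalizes redundancy: adding an exact duplicate of an already selected feature raises r̄ff to 1 and leaves the merit unchanged, so the search stops rather than keeping both copies.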